Regionalized Policy Representation for Reinforcement Learning in POMDPs
Authors
Abstract
Many decision-making problems can be formulated in the framework of a partially observable Markov decision process (POMDP) [5]. The optimality of decisions relies on the accuracy of the POMDP model as well as the policy found for the model. In many applications the model is unknown and learned empirically based on experience, and building a model is just as difficult as finding the associated policy. Since the ultimate goal of decision making is the optimal policy, it is advantageous to learn an optimal policy directly from experience, without an intervening stage of model learning.
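The idea of learning a policy directly from experience, with no intervening model-learning stage, can be illustrated with model-free Q-learning on the classic Tiger POMDP. This is only a toy sketch, not the paper's method: the problem setup, reward values, and the one-step observation memory (standing in for a learned internal decision state) are all illustrative assumptions.

```python
# Toy sketch: model-free Q-learning on the Tiger POMDP, learning a policy
# directly from experience without estimating transition/observation models.
# All constants here (rewards, noise level) are illustrative assumptions.
import random
from collections import defaultdict

random.seed(0)

ACTIONS = ["listen", "open-left", "open-right"]

def step(state, action):
    """Hidden tiger behind the 'left' or 'right' door."""
    if action == "listen":
        # Noisy observation: hear the tiger's true side 85% of the time.
        correct = random.random() < 0.85
        obs = state if correct else ("left" if state == "right" else "right")
        return state, obs, -1.0, False
    # Opening a door ends the episode: tiger door is costly, other door pays.
    reward = -100.0 if action == f"open-{state}" else 10.0
    return state, "start", reward, True

def run_q_learning(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    # Q-values indexed by the most recent observation -- a crude 1-step
    # memory standing in for a learned internal/decision state.
    Q = defaultdict(float)
    for _ in range(episodes):
        state = random.choice(["left", "right"])
        obs, done = "start", False
        while not done:
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(obs, x)])
            state, next_obs, r, done = step(state, a)
            target = r if done else r + gamma * max(Q[(next_obs, b)] for b in ACTIONS)
            Q[(obs, a)] += alpha * (target - Q[(obs, a)])
            obs = next_obs
    return Q

Q = run_q_learning()
# After hearing the tiger on one side, the learned values favor opening
# the opposite door, even though the agent never built a POMDP model.
best_after_left = max(ACTIONS, key=lambda a: Q[("left", a)])
```

Note that a one-step observation memory is generally insufficient for POMDPs that require accumulating evidence over many steps, which is exactly the gap that learned decision-state representations such as the RPR aim to fill.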
Similar Papers
The Infinite Regionalized Policy Representation
We introduce the infinite regionalized policy representation (iRPR), a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori, and infers the number of states needed to represent the policy from the experiences. We propose algorithms for learning the number of decision states while main...
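The nonparametric flavor described above can be illustrated with a stick-breaking construction, which places a prior over a conceptually unbounded set of decision states so that the number actually used is inferred from data. This is a generic Dirichlet-process sketch under assumed parameters, not the iRPR's exact model.

```python
# Illustrative sketch of a stick-breaking prior over an unbounded set of
# decision states: each break of the remaining "stick" yields the weight
# of one more state, truncated once the leftover mass is negligible.
import random

def stick_breaking_weights(alpha, tol=1e-4, seed=0):
    """Draw mixture weights over (conceptually infinite) decision states.

    alpha: concentration parameter; larger values spread mass over more states.
    tol:   truncation threshold on the remaining stick length.
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    while remaining > tol:
        # Beta(1, alpha) draw via inverse CDF: v = 1 - U**(1/alpha).
        v = 1.0 - rng.random() ** (1.0 / alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

w = stick_breaking_weights(alpha=2.0)
# The weights sum to ~1, and the number of non-negligible entries is
# determined by the draw itself rather than fixed in advance.
```

In a model like the iRPR, such weights would parameterize how experience is allocated across decision states, letting the effective number of states grow with the data.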
Deep Reinforcement Learning with POMDPs
Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has looked at treating the Atari problem as a partially observable Markov decision process (POMDP) by adding imperfect state information through image flickering [2]. However, these approaches leverage a convolutional network structure...
Data-Efficient Reinforcement Learning in Continuous-State POMDPs
We present a data-efficient reinforcement learning algorithm resistant to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) to partially observable Markov decision processes (POMDPs) by considering the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distr...
Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs
In many multi-agent applications such as distributed sensor nets, a network of agents acts collaboratively under uncertainty and local interactions. The Networked Distributed POMDP (ND-POMDP) provides a framework to model such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in pract...
Proposal for an Algorithm to Improve a Rational Policy in POMDPs
Reinforcement learning is a kind of machine learning. The Partially Observable Markov Decision Process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The Rational Policy Making algorithm (RPM) learns a deterministic rational policy in POMDPs. Though RPM can learn a policy very quickly, it needs numerous trials to improve the policy. Furthermore ...